29 research outputs found
Cross-lingual Emotion Detection
Emotion detection is of great importance for understanding humans.
Constructing annotated datasets to train automated models can be expensive. We
explore the efficacy of cross-lingual approaches that would use data from a
source language to build models for emotion detection in a target language. We
compare three approaches, namely: i) using inherently multilingual models; ii)
translating training data into the target language; and iii) using an
automatically tagged parallel corpus. In our study, we consider English as the
source language with Arabic and Spanish as target languages. We study the
effectiveness of different classification models such as BERT and SVMs trained
with different features. Our BERT-based monolingual models that are trained on
target language data surpass state-of-the-art (SOTA) by 4% and 5% absolute
Jaccard score for Arabic and Spanish respectively. Next, we show that using
cross-lingual approaches with English data alone, we can achieve more than 90%
and 80% relative effectiveness of the Arabic and Spanish BERT models
respectively. Lastly, we use LIME to interpret the differences between models
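The abstract above reports results as absolute Jaccard score and as relative effectiveness, but does not spell out either computation. The following is a minimal sketch, assuming the sample-averaged multi-label Jaccard index (each example carries a set of emotion labels) and assuming that relative effectiveness is the ratio of the cross-lingual score to the monolingual score. The function names are hypothetical and are not taken from the paper.

```python
def jaccard_score(gold, pred):
    """Mean per-example Jaccard index over sets of emotion labels.

    Assumption: an example with no gold and no predicted labels counts as 1.0.
    """
    total = 0.0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        union = g | p
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(gold)


def relative_effectiveness(cross_lingual_score, monolingual_score):
    """Cross-lingual score as a fraction of the monolingual score (assumed definition)."""
    return cross_lingual_score / monolingual_score


# Toy labels for illustration only; these are not data from the paper.
gold = [{"joy", "love"}, {"anger"}]
pred = [{"joy"}, {"anger", "fear"}]
print(jaccard_score(gold, pred))
```

Under this reading, a cross-lingual model "achieves more than 90% relative effectiveness" when its Jaccard score is at least 0.9 times that of the model trained on target-language data.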
BeaSku at CheckThat! 2021: Fine-tuning sentence BERT with triplet loss and limited data
Misinformation and disinformation are growing problems online. The negative consequences of the proliferation of false claims became especially apparent during the COVID-19 pandemic. Thus, there is a need to detect and to track false claims. However, this is a slow and time-consuming process, especially when done manually. At the same time, the same claims, with some small variations, spread simultaneously across many accounts and even on different platforms. One promising approach is to develop systems for detecting new instances of claims that have been previously fact-checked online, as in the CLEF-2021 CheckThat! Lab Task-2b. Here we describe our system for this task. We fine-tuned sentence BERT using triplet loss, and we experimented with two types of augmented datasets. We further combined BM25 scores with language model similarity scores as features in a reranker. The official evaluation results have put our BeaSku system at the second place
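The two ingredients named in the abstract above, a triplet loss and a combination of BM25 with similarity scores, can be illustrated with a short sketch. It assumes cosine distance between embeddings and a fixed margin, and it uses a plain weighted sum as a stand-in for the reranker, whose actual form the abstract does not specify. All names, weights, and the margin value are hypothetical, and the BM25 scores are assumed to be already normalized to a range comparable with the similarity scores.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: pull the positive closer than the negative by `margin`."""
    return max(
        0.0,
        cosine_distance(anchor, positive) - cosine_distance(anchor, negative) + margin,
    )


def rerank(candidates, w_bm25=0.5, w_sim=0.5):
    """Sort (claim_id, bm25_score, similarity) tuples by a weighted sum of both scores.

    Assumption: a linear combination replaces the learned reranker for illustration.
    """
    return sorted(candidates, key=lambda c: w_bm25 * c[1] + w_sim * c[2], reverse=True)


# Toy embeddings: the tweet (anchor) matches the verified claim (positive).
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
print(rerank([("claim_a", 0.2, 0.9), ("claim_b", 0.8, 0.1)]))
```

In a real system the loss would be minimized over batches of embeddings during fine-tuning, and the feature weights would be learned rather than fixed by hand.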
That is a Known Lie: Detecting Previously Fact-Checked Claims
The recent proliferation of "fake news" has triggered a number of responses,
most notably the emergence of several manual fact-checking initiatives. As a
result and over time, a large number of fact-checked claims have been
accumulated, which increases the likelihood that a new claim in social media or
a new statement by a politician might have already been fact-checked by some
trusted fact-checking organization, as viral claims often come back after a
while in social media, and politicians like to repeat their favorite
statements, true or false, over and over again. As manual fact-checking is very
time-consuming (and fully automatic fact-checking has credibility issues), it
is important to try to save this effort and to avoid wasting time on claims
that have already been fact-checked. Interestingly, despite the importance of
the task, it has been largely ignored by the research community so far. Here,
we aim to bridge this gap. In particular, we formulate the task and we discuss
how it relates to, but also differs from, previous work. We further create a
specialized dataset, which we release to the research community. Finally, we
present learning-to-rank experiments that demonstrate sizable improvements over
state-of-the-art retrieval and textual similarity approaches.
Comment: detecting previously fact-checked claims, fact-checking,
disinformation, fake news, social media, political debate
Automated Fact-Checking for Assisting Human Fact-Checkers
The reporting and analysis of current events around the globe has expanded
from professional, editor-led journalism all the way to citizen journalism.
Politicians and other key players enjoy direct access to their audiences
through social media, bypassing the filters of official cables or traditional
media. However, the multiple advantages of free speech and direct communication
are dimmed by the misuse of the media to spread inaccurate or misleading
claims. These phenomena have led to the modern incarnation of the fact-checker
-- a professional whose main aim is to examine claims using available evidence
to assess their veracity. As in other text forensics tasks, the amount of
information available makes the work of the fact-checker more difficult. With
this in mind, starting from the perspective of the professional fact-checker,
we survey the available intelligent technologies that can support the human
expert in the different steps of her fact-checking endeavor. These include
identifying claims worth fact-checking; detecting relevant previously
fact-checked claims; retrieving relevant evidence to fact-check a claim; and
actually verifying a claim. In each case, we pay attention to the challenges in
future work and the potential impact on real-world fact-checking.
Comment: fact-checking, fact-checkers, check-worthiness, detecting previously
fact-checked claims, evidence retrieva
Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media
We present an overview of the third edition of the CheckThat! Lab at CLEF
2020. The lab featured five tasks in two different languages: English and
Arabic. The first four tasks compose the full pipeline of claim verification in
social media: Task 1 on check-worthiness estimation, Task 2 on retrieving
previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on
claim verification. The lab is completed with Task 5 on check-worthiness
estimation in political debates and speeches. A total of 67 teams registered to
participate in the lab (up from 47 at CLEF 2019), and 23 of them actually
submitted runs (compared to 14 at CLEF 2019). Most teams used deep neural
networks based on BERT, LSTMs, or CNNs, and achieved sizable improvements over
the baselines on all tasks. Here we describe the task setup, the evaluation
results, and a summary of the approaches used by the participants, and we
discuss some lessons learned. Last but not least, we release to the research
community all datasets from the lab as well as the evaluation scripts, which
should enable further research in the important tasks of check-worthiness
estimation and automatic claim verification.
Comment: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based
Verification, Detecting Previously Fact-Checked Claims, Social Media
Verification, Computational Journalism, COVID-1
Overview of the CLEF-2022 CheckThat! Lab Task 1 on Identifying Relevant Claims in Tweets
We present an overview of CheckThat! lab 2022 Task 1, part of the 2022 Conference and Labs of the Evaluation Forum (CLEF). Task 1 asked participants to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics in six languages: Arabic, Bulgarian, Dutch, English, Spanish, and Turkish. A total of 19 teams participated, and most submissions achieved sizable improvements over the baselines using Transformer-based models such as BERT and GPT-3. Across the four subtasks, approaches that targeted multiple languages (be it individually or in conjunction) in general obtained the best performance. We describe the dataset and the task setup, including the evaluation settings, and we give a brief overview of the participating systems. As usual in the CheckThat! lab, we release to the research community all datasets from the lab as well as the evaluation scripts, which should enable further research on finding relevant tweets that can help different stakeholders such as fact-checkers, journalists, and policymakers